Privacy-Preserving Data Mining

نویسندگان

  • Alexandre Evfimievski
  • Tyrone Grandison
چکیده

Privacy-preserving data mining (PPDM) refers to the area of data mining that seeks to safeguard sensitive information from unsolicited or unsanctioned disclosure. Most traditional data mining techniques analyze and model the data set statistically, in aggregation, while privacy preservation is primarily concerned with protecting against disclosure individual data records. This domain separation points to the technical feasibility of PPDM. Historically, issues related to PPDM were first studied by the national statistical agencies interested in collecting private social and economical data, such as census and tax records, and making it available for analysis by public servants, companies, and researchers. Building accurate socioeconomical models is vital for business planning and public policy. Yet, there is no way of knowing in advance what models may be needed, nor is it feasible for the statistical agency to perform all data processing for everyone, playing the role of a trusted third party. Instead, the agency provides the data in a sanitized form that allows statistical processing and protects the privacy of individual records, solving a problem known as privacy-preserving data publishing. For a survey of work in statistical databases, see Adam and Wortmann (1989) and Willenborg and de Waal (2001). The term privacy-preserving data mining was introduced in the papers Agrawal and Srikant (2000) and Lindell and Pinkas (2000). These papers considered two fundamental problems of PPDM: privacy-preserving data collection and mining a data set partitioned across several private enterprises. Agrawal and Srikant devised a randomization algorithm that allows a large number of users to contribute their private records for efficient centralized data mining while limiting the disclosure of their values; Lindell and Pinkas invented a cryptographic protocol for decision tree construction over a data set horizontally partitioned between two parties. These methods were subsequently refined and extended by many researchers worldwide. Other areas that influence the development of PPDM include cryptography and secure multiparty computation (Goldreich, 2004; Stinson, 2006), database query auditing for disclosure detection and prevention (Dinur & Nissim, 2003; Kenthapadi, Mishra, & Nissim, 2005; Kleinberg, Papadimitriou, & Raghavan, 2000), database privacy and policy enforcement (Aggarwal et al., 2004; Agrawal, Kiernan, Srikant, & Xu 2002), database security (Castano, Fugini, Martella, & Samarati, 1995), and of course, specific application domains.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Privacy Preserving Data Mining

There is a tremendous increase in the research of data mining. Data mining is the process of extraction of data from large database. Knowledge Discovery in database (KDD) is another name of data mining. Privacy protection has become a necessary requirement in many data mining applications due to emerging privacy legislation and regulations. One of the most important topics in research community...

متن کامل

Survey on Privacy Preserving Data Mining

Data mining is the process of extraction of data from large database. One of the most important topics in research community is Privacy preserving data mining (PPDM). Privacy preserving data mining has become increasingly popular because it allows sharing of privacy sensitive data for analysis purposes. It is essential to maintain a ratio between privacy protection and knowledge discovery. To s...

متن کامل

A General Survey of Privacy-Preserving Data Mining Models and Algorithms

In recent years, privacy-preserving data mining has been studied extensively, because of the wide proliferation of sensitive information on the internet. A number of algorithmic techniques have been designed for privacy-preserving data mining. In this paper, we provide a review of the state-of-the-art methods for privacy. We discuss methods for randomization, k-anonymization, and distributed pr...

متن کامل

Privacy Preserving Data Mining Techniques: Challenges & Issues

Privacy preserving becomes an important issue in the development of various data mining techni'ques. In this paper, we have discussed various techniques to preserve privacy while mining data. In the absence of uniform framework across all data mining techniques, researchers have focused on data technique specific privacy preserving issue. Available framework and algorithms provide further insig...

متن کامل

Privacy Preserving Frequency Mining in 2-Part Fully Distributed Setting

Recently, privacy preservation has become one of the key issues in data mining. In many data mining applications, computing frequencies of values or tuples of values in a data set is a fundamental operation repeatedly used. Within the context of privacy preserving data mining, several privacy preserving frequency mining solutions have been proposed. These solutions are crucial steps in many pri...

متن کامل

Privacy-Preserving Data Mining: Development and Directions

This article first describes the privacy concerns that arise due to data mining, especially for national security applications. Then we discuss privacy-preserving data mining. In particular, we view the privacy problem as a form of inference problem and introduce the notion of privacy constraints. We also describe an approach for privacy constraint processing and discuss its relationship to pri...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007